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ABSTRACT 

Counts-in-cells are measured in the rCDM Virgo Hubble Volume simulation. This 
large V-body experiment has 10 9 particles in a cubic box of size 2000 hr 1 Mpc. The 
unprecedented combination of size and resolution allows for the first time a realistic 
numerical analysis of the cosmic errors and cosmic correlations of statistics related to 
counts-in-cells measurements, such as the probability distribution function Pjy itself, 
its factorial moments Fk and the related cumulants £ and Sjv's. These statistics are 
extracted from the whole simulation cube, as well as from 4096 sub-cubes of size 
125 ft. _1 Mpc, each representing a virtual random realization of the local universe. 

The measurements and their scatter over the sub-volumes are compared to the 
theoretical predictions of Colombi, Bouchet & Schaeffer (1995) for Pq, and of Szapudi 
& Colombi (1996, SC) and Szapudi, Colombi & Bernardeau (1999a, SCB) for the 
factorial moments and the cumulants. The general behavior of experimental variance 
and cross-correlations as functions of scale and order is well described by theoretical 
predictions, with a few percent accuracy in the weakly non-linear regime for the cosmic 
error on factorial moments. On highly non-linear scales, however, all variants of the 
hierarchical model used by SC and SCB to describe clustering appear to become 
increasingly approximate, which leads to a slight overestimation of the error, by about 
a factor of two in the worst case. Because of the needed supplementary perturbative 
approach, the theory is less accurate for non-linear estimators, such as cumulants, 
than for factorial moments. 

The cosmic bias is evaluated as well, and, in agreement with SCB, is found to be 
insignificant compared to the cosmic variance in all regimes investigated. 

While higher order statistics were previously evaluated in several simulations, 
this work presents text book quality measurements of Sn's, 3 < N < 10, in an 
unprecedented dynamic range of 0.05 £ SS 50. In the weakly nonlinear regime the 
results confirm previous findings and agree remarkably well with perturbation theory 
predictions including the one loop corrections based on spherical collapse by Fosalba 
& Gaztahaga 1998. Extended perturbation theory is confirmed on all scales. 

Key words: large scale structure of the universe - galaxies: clustering - methods: 
numerical - methods: statistical 



1 INTRODUCTION 

Measurements of higher order statistics in galaxy catalogs 
test theories of structure formation, the nature of the ini- 
tial fluctuations, and the processes of galaxy formation. The 
power of such measurements to constrain theories, however, 
depends crucially on the detailed understanding of the er- 
rors. Usually it is tacitly assumed that the underlying dis- 
tribution of events is Gaussian and thus the term "errors" 
becomes synonymous with the "variance". Knowledge of 
the variance is sufficient only when the error distribution 
is Gaussian. 



For statistics related to counts-in-cells a rigorous theory 
for the cosmic errors was presented in a suite of papers by 
Szapudi & Colombi 1996, hereafter SC, Colombi, Szapudi 
& Szalay 1998, and Szapudi, Colombi & Bernardeau 1999a, 
hereafter SCB. Nevertheless these calculations relied on ap- 
proximations, for which the domain of validity could not 
be checked extensively until the arrival of the Virgo Hubble 
Volume Simulations. Moreover, the regime where the under- 
lying cosmic distribution is Gaussian could not be examined 
previously. This paper addresses the first problem by study- 
ing the statistical errors and cross-correlations numerically, 
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while a companion paper, Szapudi et al. (1999c, hereafter 
paper II), discusses the underlying distributions of statistics 
in their full splendour. 

Let us consider a statistic A measured in a galaxy cat- 
alog of volume V. The corresponding indicator is denoted 
by A. In practice, only one sample of our local universe is 
accessible. However, a frequentist numerical experiment can 
be performed in a large numerical simulation if a sufficient 
number Ce of galaxy catalogs £i can be extracted from it. 
In each of them a value Ai, 1 < i < Ce, can be measured. 

For any statistic A the cosmic distribution function 
T(^4) is the probability density of measuring the value A 
in a particular finite realization. This distribution function 
can be approximately extracted from the Ce subsamples un- 
der the ergodic hypothesis. For simplicity, we dispense with 
the (logical) notation T, and replace it in what follows with 
T. This expresses the fact that we do not wish to enter one 
more level of complexity by considering the "error on the 
error" problem (SC) in greater detail. The smoothness and 
regularity of our measurements suggest that the number of 
realizations, which represent a two orders of magnitude im- 
provement over any previous work, is large enough to pro- 
vide an adequate determination of the quantities measured. 

While in practice the function T(A) is the fundamental 
quantity underlying all measurements, this paper concen- 
trates on its first two moments; paper II examines its shape 
and skewness in detail. 

In the following definitions, integrals are to be under- 
stood as summations of the estimator over the distribution 
function. The first moment of T(A) is the spatial average 



AT(A)dA = (A) = A, 



(1) 



where it is assumed that the estimator A is unbiased. The 
bias is negligible compared to the relative cosmic error in 
most meaningful cases (SCB) as illustrated later by practical 
examples. For completeness, however, the definition of the 
cosmic bias is 



b A 



(A) -A 



(2) 



The second (centered) moment of the cosmic distribution is 
called the cosmic error, 



(A - A) 2 T{A)dA = ((A - A) 2 ) = (AA) 2 



(3) 



For a biased statistic, the variance should be centered 
around the biased average and not the true value. It can 
however be shown formally (SCB) that the above definition 
is valid to second order in AA/A for any biased statistic.^ 

Finally, the cosmic covariance can be defined analo- 
gously to the variance as {(A — A)(B — B)). 

The theoretical results for the errors and cross- 
correlations are summarized below. If v and V are the cell 
and catalog volumes respectively, the cosmic error can be 
approximately separated into three components to leading 
order in v/V (SC): 



* More precisely, to first order in — Xi)(xj — Xj)) where £j 
denote the unbiased estimators from which A is constructed in a 
non-linear fashion. 



• The discreteness or shot noise error which is due to 
the finite number of objects A^bj in the catalog. It increases 
towards small scales and with the order of the statistics con- 
sidered, but becomes negligible when N ohi is very large. 

• The edge effect error is due to the uneven weight given 
to galaxies near the edges of the survey compared to those 
near the centre. It is especially significant on large scales, 
comparable to the size of the catalog. 

• The finite volume error is due to fluctuations of the un- 
derlying density field on scales larger than the characteristic 
size of the catalog. 

The next to leading order correction in v/V is proportional 
to the perimeter of the catalog dV. At this level of accuracy 
there are also correlations between the three sources of error 
(e.g., Colombi et al. 1999, hereafter CCDFS). 

Colombi, Bouchet & Schaeffer (1995, hereafter CBS) 
investigated in detail the cosmic error on the void prob- 
ability function. The groundwork for error calculations of 
statistics related to counts-in-cells is based on SC where the 
cosmic error for factorial moments^] was evaluated analyti- 
cally. SCB, extended the work of SC to cross-correlations, 
including perturbation theory predictions (e.g., Bernardeau 
1996). The cosmic errors, biases (see also Hui & Gaztanaga 
1998, hereafter HG) and covariances for cumulants^ £ and 
Sn were calculated as well. The main goal of this paper is to 
compare the analytical predictions of CBS, SC and SCB to 
measurements made in the VIRGO rCDM Hubble Volume 
simulation. 

The exhaustive nature of the comparison that follows 
warrants the questions: is it meaningful to thrive for the 
detailed numerical understanding of the theory? How much 
of it is practically useful? Can it accurately estimate the 
errors on measurements in future surveys? While some of 
these questions were addressed in SCB, a brief account of 
supporting arguments is given next. 

The analytics do take into account all possible theo- 
retical errors, but systematics, such as those resulting from 
cut out holes, incompleteness from fiber separation, possi- 
ble magnitude errors in the case of the 2dF, etc., could in 
principle corrupt the theory and introduce biases. These ef- 
fects might even require detailed simulation of the survey. 
In the case of the UKST and Stromlo surveys such simu- 
lations were performed and compared with the predictions: 
the spectacular agreement surprised even the present au- 
thors (Hoyle, Szapudi, & Baugh 1999). Thus systematics do 
not dominate in all surveys; for another example, where cut 
out holes found to have insignificant effect on the cosmic 
probability distribution of the two-point correlations func- 
tion see Kerscher, Szapudi, & Szalay (1999). 

Moreover, the wide theoretical framework is flexible 
enough to incorporate all systematics, which have the ef- 
fect of altering certain parameters, such as the factorial mo- 
ments. In such case any bias can be corrected for. 

There might be unforeseen systematics which have such 
complicated non-linear effect that cannot even be modelled 
by appropriate alteration of a set of parameters. While it 
would be difficult to anticipate whether these could domi- 
nate for a particular survey, it is still instructive to inves- 



t e.g., Appendix A for definitions and notations. 
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tigate the potential results in an ideal case, especially dur- 
ing design phase of the survey. Error calculations help op- 
timizing geometry, sampling, and other parameters. During 
the design of the VIRMOS survey such considerations were 
taken into account (Colombi et al. 1999). These calculations 
as well as maximum likelihood analyses need to explore such 
a large region in parameter space, that they would typically 
be impractical to carry out with simulations. 

In addition to applications to surveys, the theory can 
be applied reliably to assess significance of measurements in 
simulations where multiple runs would be too costly (e.g., 
Szapudi et al. 1999d) . All these present and potential future 
applications motivate the detailed investigations performed 
in this article. 

The exposition is organized as follows. § 2 describes 
the TV-body data used for the purpose of our study. § 3 
analyses the count-in-cells distribution function Pn, its cu- 
mulants £ and Sn's, and the scaling function of the void 
probability distribution a = — ln(Po)/-Pi- These quantities 
are measured in the full simulation as well as in Ce = 4096 
subsamples. The accuracy of simulation is assessed by com- 
paring the measurements to the non-linear Ansatz of Hamil- 
ton et al. (1991) improved by Peacock and Dodds (1996, 
hereafter PD), and to perturbation theory predictions (here- 
after PT). The model of Fosalba & Gaztaiiaga (1998) and 
extended perturbation theory (hereafter EPT, see Colombi, 
Bernardeau, Bouchet & Hernquist 1997) are considered as 
well. § 4 extends these investigations to the cosmic error 
and the variance of the cosmic distribution function. A pre- 
liminary investigation of the cross-correlations is done for 
factorial moments and cumulants. The measurements are 
compared where possible to the theoretical predictions of 
SC, SCB and CBS, including extended perturbation the- 
ory. Finally § 5 recapitulates the results and discusses their 
implications. In addition, Appendix A gives a summary of 
the definitions and notations used in this paper for counts- 
in-cells statistics. It will be useful for the reader unfamiliar 
with these concepts. 



2 THE TV-BODY DATA 

The rCDM Hubble volume simulation (e.g., Evrard et 
al. 1999) was carried out using a parallel P 3 M code de- 
scribed in MacFarland et al. (1998). The code was run on 
512 processors of the Cray T3E-600 at the Rechenzentrum 
in Garching. 

Initial conditions were laid down by imposing perturba- 
tions on an initially uniform state represented by a "glass" 
distribution of particles generated by the method of White 
(1996). Because of the size of the simulation, a glass file of 
10 6 particles was tiled 10 times in each direction. As the ini- 
tial glass file was created with periodic boundary conditions 
tiling does not create any non-uniformities at the interface 
between the tiles. 

A Gaussian random density field was set up by perturb- 
ing the positions of the particles and assigning velocities to 
them according to the growing mode linear theory solutions, 
using the algorithm described by Efstathiou et al. (1985). In- 
dividual modes were assigned random phases and the power 
for each mode was selected at random from an exponential 
distribution with mean power corresponding to the desired 



power spectrum (j^j). Unlike Efstathiou et al. (1985), how- 
ever, the initial velocities were set up exactly proportional to 
the initial displacements, according to the Zel'dovich (1970) 
approximation. As shown by Scoccimarro (1998) this leads 
to larger initial transients. To compensate for this the sim- 
ulation was started at a high redshift, z — 29. 

The cosmological model used for the simulation tCDM 
is described in more detail in Jenkins et al. (1998). The 
approximation to the linear CDM power spectrum (Bond & 
Efstathiou 1984) was used 



<l*2l> = 



Ah 



1 + [aq + (6g) 3 / 2 + (cq) 2 



2/v 



(4) 



where q = k/T, a = 6.4 ft" 1 Mpc, b = 3 h' 1 Mpc, c = 
1.7 h~ x Mpc and v — 1.13. The value of F was set equal to 
0.21. The normalization constant, A, is chosen by fixing the 
value of <rf (the linear variance of the matter distribution in 
a sphere of radius 8 h^ 1 Mpc at z = 0). A value of as = 0.6 
was motivated by estimates based on cluster abundances 
(White, Efstathiou & Frenk 1993; Eke, Cole & Frenk 1996). 

The simulation was integrated using a leapfrog scheme 
as described in Hockney & Eastwood (1981), section 11- 
4-3. The simulation was completed in 500 equal steps in 
time. The softening used was 100 kpc/h comoving Plummer 
equivalent - see Jenkins et al. (1998). 



3 COUNTS-IN-CELLS ANALYSIS : THE 
UNDERLYING STATISTICS 

The count probability distribution function (CPDF) Pn is 
defined as the probability of finding N objects in a cell 
of volume v thrown at random in the catalog. CPDF was 
measured in the whole simulation £ for cubic cells of size 
Lbox/512 < I < Lbox/8, where L box = 2000 h' 1 Mpc is the 
size of the simulation cube (see Table |l|). Then the simula- 
tion cube was divided into 16 3 contiguous cubic subsamples 
Si of size L = 125 /i" 1 Mpc. Pn was evaluated in each 
of these for i/512 < I < L/2 (see Table |). The succes- 
sive convolution algorithm of Szapudi et al. (1999d, here- 
after SQSL) allowed the determination of the CPDF on all 
scales simultaneously in only a few minutes of CPU on a 
workstation^] with 512 3 sampling cells. The accuracy is thus 
Pn > fmin.i = 1/512 3 ~ 7.45 x 10~ 9 for the measurement 
in £ and for each individual £ i ; the accuracy increases by av- 
eraging over all subsamples: Pn > -P m in,2 = 1/(512 x 16) 3 ~ 
1.82 x 10~ 12 . For 4 ^ I <> 63 h' 1 Mpc the measurements 
in £ and £i overlap (Table |l]). This is illustrated by Fig. [jj 
displaying Pn as a function of N: the figure presents the 
CPDF extracted from both the full cube and averaged over 
all the sub-cubes. In the overlap region, the difference can be 
detected as slight irregularities of the high-A r tail from the 
full cube measurements. The figure suggests that at least on 
the smallest scales considered in £ (or each £;), our sam- 
pling is probably insufficient by the standards of SC. How- 
ever, this does not affect significantly the calculations as 
indicated by the agreement of the moments measured in £ 



■f This estimate does not include the reading in of the file. 
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Figure 1. The measured CPDF as a function of N. Various scales 
are plotted as described in the text and in Table [l]. The curves 
shift to the right as £ increases. 

and those calculated from averages obtained from the sub- 
samples. Therefore measurement errors will be neglected in 
what follows, i.e. infinite sampling is assumed. Note that 
this ideal can be achieved in practice for two-dimensional 
and small three-dimensional catalogs via the algorithm of 
Szapudi (1998), however, the present simulation is too large 
for this method. 

The smallest scale considered is only 2.4 times larger 
than the softening length A e = 100 h~ kpc. As discussed 
extensively in Colombi, Bouchet & Hernquist (1996), con- 
tamination by softening restricts the validity of the simula- 
tion on small scales. For spherical cells of radius R, at least 
R <; 4A E should hold. For the cubic cells of the present simu- 
lation this condition translates to I J; 6.5A e ~ 0.65 hT 1 Mpc. 
Thus the two smallest cell sizes i.e. the two leftmost points 
could be contaminated by softening, a fact that should be 
borne in mind, especially when comparing with theoretical 
calculations which employ models motivated by dynamics. 
On the other hand, for statistical purposes the dynamics 
can be ignored and the simulation can be regarded as a set 
with prescribed statistics. Then the possible contamination 
is irrelevant at the level of the approximations taken in the 
next sections. 

Another possible source of contamination could be in 
principle the anticorrelation introduced by the glass initial 
positions. The effect of this is, however, extremely small as 
evidenced by the measurement of £ shown below. 

Figure H displays the average correlation function £ as 
a function of scale. By definition 



r 2 



(5) 



where £(r) is the two-point correlation function. In practice, 
it is obtained as the variance of the counts-in-cells, corrected 
for discreteness effects automatically via the use of factorial 
moments (e.g., see SQSL and Appendix A for the detailed 
description of the method used in this paper to obtain the 
cumulants including the variance from counts-in-cells). The 
measured £ is compared with linear theory (dots) and with 
the non-linear Ansatz of Hamilton et al. (1991) improved 
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Figure 2. The averaged two-point correlation function £ as a 
function of scale. It is compared with linear theory (dots) and 
with the non-linear Ansatz of Hamilton et al. (1991) with the 
recipe of Peacock Sz Dodds (1996) (dashes). The open symbols 
correspond to the £ obtained from the CPDF averaged over all 
the subsamples Si and the filled symbols to the measurement in 
S. 



by PD (dashes). As expected, the agreement with linear 
theory in the regime £ ,SS 1 is excellent, even on the largest 
scales where the anticorrelations introduced by the glass ini- 
tial condition could cause contamination. The two leftmost 
points are slightly below the dashes, because of softening ef- 
fects as discussed above, otherwise the results are in perfect 
accord with theory. 

Figure ^ plots the extracted cumulants Six's against £. 
They are compared with predictions of various models, in- 
cluding perturbation theory (PT, dots). By definition (e.g., 
Balian & Schaeffer 1989a) 



Sn = N 



(6) 



where £ N is the Appoint correlation function averaged over 
a cell: 



d 3 n 



■ d rjv£jv(n, • ■ ■ , rjv). 



(7) 



Perturbation theory predictions have been calculated for 
spherical cells by Juszkiewicz, Bouchet & Colombi (1993) for 
5*3 and extended to arbitrary order by Bernardeau (1994): 



Sn{£) = /iv(7i,- • • i7jv-2), 
_ d*logf 



For example 
34 

S3 = — +71, 



S 4 = 



60712 62 



7 



1323 



+ Y 7l + 3 7l ~ 3 72 - 



(8) 
(9) 



(10) 



(11) 



The dots on Fig. ^| assume 7* = 0, i > 2. While this is 
incorrect in principle for a scale dependent spectrum such 
as tCDM, the long dashes on the left-hand panels prove that 
the contribution of 72 is insignificant. Higher order ji terms, 
as discussed also by Baugh, Gaztanaga & Efstathiou (1995), 
have an even smaller effect and can be rightly neglected. 
PT predictions are accurately fulfilled in the weakly 
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Table 1. The scales for which we measured the CPDF 
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Figure 3. The cumulants 5jv = £jv/£ * as functions of £ compared to various theoretical models. The left-hand panels show the full 
dynamic range, while the right-hand ones concentrate on the transition to the non-linear regime. The models considered are perturbation 
theory (dots on all panels and long dashes on left panels), extended perturbation theory (short dashes), and one loop perturbation theory 
based on the spherical model (dots-long dashes on right panels). The upper and the lower panels give Sjv for 6 < N < 10 and 3 < N < 5 
respectively (the value of Spj increases with order TV). The convention for the symbols is the same as in Fig. ^. Note that the right-hand 
panels show only the measurements in the full simulation £. 



non-linear regime. This confirms again numerous earlier 
works (see, e.g. Juszkiewicz, Bouchet & Colombi 1993; 
Bernardeau 1994; Juszkiewicz et al. 1995; Gaztanaga & 
Baugh 1995; Baugh, Gaztanaga & Efstathiou 1995; SQSL). 
In fact the textbook quality agreement with PT demon- 
strates the accuracy of the rCDM Hubble Volume simu- 
lation. 

The dashes give the predictions obtained from extended 
perturbation theory (EPT, Colombi et al. 1997; see also Sza- 



pudi, Meiksin & Nichol 1996 for EPT applied to galaxy data, 
and Scoccimarro & Frieman, 1998 for "hyperextended" per- 
turbation theory). EPT assumes that the same forms of the 
higher order moments are preserved in the highly non-linear 
regime. There 71 above is simply an adjustable parameter 
without any particular meaning, i.e. 



Ti.cff = 7i (S3 ) = 53 



34 

y 



(12) 



where S3 is the measured one. With this value of 71 the Sn's, 
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N > 4, can be computed using equation (^) (with 7; = 0, i > 
2). The dashed curves matches the measurements quite well 
even in the highly non-linear regime thereby reconfirming 
the efficiency of EPT (see also SQSL). The agreement is 
not expected to be absolutely perfect from this Ansatz: on 
Fig. ^, EPT tends to underestimate slightly the measured 
values of Sn when 1 ^ £ ^ 10. 

The dynamic range in the upper left panel of Fig. ^| is 
narrower than in the lower left panel: on large scales the 
agreement between PT and measurement becomes less ac- 
curate for the Sjv's, especially if N is large. This might be 
related to transients due to the initial setup of the particles 
on a glass perturbed by using the Zel'dovich approximation. 
On the one hand, the transients related to pure Zel'dovich 
should decrease the value of the Sn's (e.g., Juszkiewicz et 
al. 1993 and Scoccimarro 1998) while, on the other hand, 
the anticorrelations due to the glass could have the oppo- 

— N — 1 — 

site effect by decreasing £ more than £ N . Although this 
problem was not examined in detail, the glass contamina- 
tion on £ appears to be inconsequential. Alternatively, finite 
volume effects can degrade the high-iV tail of the CPDF 
(e.g., Colombi, Bouchet & Schaeffer 1994; CBS; Colombi et 
al. 1996). In addition, it is worth reemphasising that two 
rightmost points are prone to errors caused by softening as 
discussed earlier. 

The right-hand panels of Fig. ^ zoom in on the tran- 
sition between the weakly and highly nonlinear regime. 
For comparison, PT (with 7; = 0, i > 2, dots), EPT 
(dashes), and the one loop perturbation theory of Fosalba 
& Gaztanaga (1998) (dots- long dashes) are displayed. The 
last model yields agreement with the extracted values of Sn 
for £ ^ 1, or even larger when the order N is high enough 
(see upper right panel). This affirms the success of one-loop 
perturbation theory (see also Lokas et al. 1996; Scoccimarro 
et al. 1998). Interestingly, EPT produces almost identical 
results to the spherical model when £ £ 1. 

Finally, figure ^ shows a — — ln(Po) /N as a function of 
scale, compared with EPT predictions. By definition (White 
1979; Balian & Schaeffer 1989a; see also Appendix A) 



-1 Sn 
AH 



(13) 



where N is the average count in a cell. This function is 
thus sensitive to low order statistics when iV c = N£ <C 1, 
and to high order statistics when N c ^> 1. According to 
Fig. ^, EPT is an accurate Ansatz on small scales where a 
is close to unity and is dominated by low order Sn- It is a 
less precise approximation on the largest scales probed, as 
expected. Indeed, the rightmost point of Fig. ^ corresponds 
to where £ ~ 1 in Fig. ^. There EPT increasingly under- 
estimates the Sn's when N is high. Note the remarkable 
power- law behavior of a oc r D ° , D ~ 0.25, in agreement 
with the predictions of the scaling model of Balian & Schaef- 
fer (1989a). This reflects a non-trivial (multi)fractal particle 
distribution (Balian & Schaeffer 1989b) with a Hausdorff di- 
mension Dq. Such behavior was found in a standard CDM 
model by Bouchet, Schaeffer & Davis (1991). Subsequently, 
the fractal distribution with Do — 0.5 was established by 
Colombi, Bouchet & Schaeffer (1992). 



7 




l (h _1 Mpc) 



Figure 4. The scaling function a = — ln(Po)/JV, compared with 
extended perturbation theory (dots). The convention for the sym- 
bols is the same as in Fig. 0. Note that on the largest scales we 
measure Po = 0, and thus no points are plotted. For the direct 
measurement in E there is no empty cell with t = 7.8 h~ 1 Mpc 
because of our insufficient sampling. 



4 THE COSMIC ERROR 

In the previous section we demonstrated that good agree- 
ment was obtained comparing measurements made on the 
rCDM Hubble Volume dataset with previous work regard- 
ing higher order clustering statistics. Having established the 
accuracy of the dataset this section concentrates on the 
the determination of cosmic errors and their comparison 
to the available theoretical predictions, where possible. In 
§ 4.1 we summarise analytic calculations of the cosmic er- 
rors and their cross-correlations. From this follows a sys- 
tematic study of the experimental cosmic error of low-order 
statistics, i.e. factorial moments Fk, 1 < k < 4 (§ 4.2), and 
cumulants £, S3 and 54 (§ 4.3) together with a thorough 
comparison with the theoretical predictions. Also in § 4.3 
we discuss the cosmic bias of the cumulants. Then the void 
probability and its scaling function a are explored (§ 4.4) fol- 
lowed by the cosmic error on the CPDF itself (§ 4.5). Finally, 
in § 4.6, there is a preliminary investigation of the cosmic 
cross-correlations of factorial moments and cumulants. 

In all subsequent figures, except for the cross- 
correlations, there are errorbars plotted on the symbols cor- 
responding to measurements due to the finite number of 
realizations Cg = 4096. These measurement errors, propor- 
tional to 1/y/Ce (SC), are negligible for our simulation, and 
the errorbars are smaller than the size of the symbols in 
most cases. As discussed in the Introduction, we neglect the 
cosmic error on the determination of the cosmic error (which 
is due to the finite size of the Hubble Volume itself) because 
in practice it is insignificant. 



4.1 Cosmic Error: Theoretical Predictions 

Before making any comparison with the analytic predictions, 
we outline the main ideas in CBS, SC, and SCB - more 
details can be found in these papers. Spherical cells of radius 
I are assumed throughout for simplicity. 
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The bivariate CPDF PN,Ai(£,r) is the probability of 
finding TV and M points in two cells of size I at distance 
r = \n — r%\ from each other. According to SC the cosmic 
error is computed via a double integral of Pn,m(£, r) over 
n, and r%, conveniently split according to whether the cells 
overlap or not: 

(i) overlapping cells (r 21): give rise to the discreteness 
and edge effect errors (see Introduction). The locally Pois- 
sonian assumption (CBS, SC) enables the approximate rep- 
resentation of the generating function P(x,y) for overlap- 
ping cells by using only the monovariate generating function 
P(x), i.e. the calculation depends on f, Sn, N > 3 and the 
average count TV. 

(ii) disjoint cells (r <; 21): generate the finite volume error 
(see Introduction). To simplify the writing of Pn,u(^,t), 
the distance r is assumed to be large enough compared to 
the cell size such that the bivariate CPDF can be Taylor 
expanded (to first order) in terms of £( r )/£- This approx- 
imation is surprisingly accurate even when the cells touch 
each other (Szapudi, Szalay, & Boschan 1992, Bernardeau 
1996, hereafter B96). Three models are used: two particular 
but still quite general forms of the hierarchical model, SS 
and BeS, introduced by Szapudi & Szalay (1993a, hereafter 
SSa, 1993b) and by Bernardeau & Schaeffer (1992), respec- 
tively, and perturbation theory, hereafter PT (B96). See SC 
and SCB for more details. The former two models depend 
only on monovariate statistics, i.e. on £ and Sn, TV > 3 and 
TV. PT on the other hand is expressed in terms of ji, £ and 
TV (B96). In principle, PT is accurate only in the weakly 
non- linear regime, for which it was originally designed, but 
it can be extended to the nonlinear regime as well: for mono- 
variate distributions, EPT was proposed by Colombi et al. 
(1997), as discussed and tested versus measurements in § 3. 
This Ansatz can actually be naturally generalized to the bi- 
variate CPDF (Szapudi & Szalay 1997, SCB). Our version, 
denoted by E 2 PT, takes the measured (non- linear) value for 
£,, 7i,off from equation ( [l2] ) and it assumes, as EPT, 7^ = 
for i > 2. 

Except for the error on the void probability and its scal- 
ing function a detailed in CBS, the theoretical results shown 
in this section were computed to leading order in v/V , where 
v is the cell volume and V = L 3 is the sample volume. 

The calculation of the error on a statistics of order k 
depends on TV = Fi, £, £(L), the average of the correlation 
function over the survey (see below), and Sjv, 3 < TV < 2k. 
PT is determined by y t , i < 2k - 2 (§ 3) and E 2 PT by 7i, cff 
as explained above. In all cases, we use the measured value 
of TV. Other parameters are chosen as follows: 

(a) PT: linear theory is employed to compute £ and 
[the catalog is assumed to be spherical to simplify the cal- 
culation of integral ( |l6| ) below] while higher order statistics 
are evaluated according to eq. ([]) with 7* = 0, i > 2. 

(b) Other models: : the experimental £ is used (open sym- 
bols on Fig. ^). The quantity £(£) is computed numerically 
with the non-linear Ansatz of PD discussed in § 3 (assuming 
that the catalog is spherical). For the Sjv's, the measure- 
ments (open symbols on left panels of Fig. H) are used for 
i < 15 Mpc. On larger scales, EPT is more appropriate 
to determine Sn, TV > 4: the increasing inaccuracy of the 
Sn's on large scales and for large TV require this procedure. 



It is justified all the more since when £ ^ 0.27 EPT matches 
quite well to the PT predictions (see Fig. ^). 

There is a subtlety worth mentioning which concerns the 
finite volume error, proportional to the integral 

?(L) = 4 / dPrufntdn-ral). (14) 

V Jr 12 >2i 

To leading order in v/V, this integral reads (CCDFS) 

=?„(£) -^li(2Q, (15) 

with 

?„(£) = i / d 3 nd 3 r 2 a\ri-r 2 \), (16) 

* Jri,r 2 eV 

f,(i) = - 4nr 2 Z(r)dr. (17) 
v Jr<e 

In the above equations, V corresponds to the volume covered 
by cells of volume v included in the catalog. 

The next to leading order correction, £ 1; can be identi- 
fied as a negligible correction to the edge effects for most 
practical purposes. Although it did not make a significant 
difference, we included this correction nonetheless. 

4.2 Cosmic Error: Factorial Moments 

Figure ^ presents the cosmic error measured for the factorial 
moments Fk, 1 < k < 4. By definition 

F k = ((TV) fe ) = (TV(TV-l) ■ • ■ (TV-fc+1)) = ^(TV) fe Piv.(18) 

JV 

The factorial moments directly estimate the moments of the 

k , 

underlying continuous density field: F k — TV (p *) where 
TV" = F\ is the average count (e.g., SSa). On Fig. @, the 
dotted, dash, long dash and dotted-long dash curves corre- 
spond to SS, BeS, E 2 PT and PT. 

All the models converge and agree quite well with the 
measurements on large scales t <; £0 ~ 7.1 h~~ x Mpc, as ex- 
pected, since PT predictions should be valid. In contrast, 
on small scales I < £0 the models overestimate slightly the 
numerically obtained error, E 2 PT being the most accurate. 
It is worth remembering that the leftmost two points may 
be contaminated by smoothing effects and should not be 
over-interpreted. Nevertheless, the decrease of precision on 
small scales suggests that our assumptions (i) or (ii) in § 4.1 
are becoming more and more approximate in the non-linear 
regime, i.e. either the local Poisson assumption or the par- 
ticular hierarchical decompositions loose their accuracy. To 
test this idea the contribution of overlapping cells (edge + 
discreteness effects) were separated from the contribution of 
disjoint cells (finite volume effects), as shown respectively as 
solid and dash-long dash curves on Figure |^, which concen- 
trates on E 2 PT (long dashes). Note that the solid curve rep- 
resents the SS and BeS models as well. Finite volume effects 
appear to dominate on small scales because our subsamples 
are dense enough to suppress discreteness error as expected 
(SC). This pinpoints assumption (ii) as the source of inac- 
curacy. Note that naively one would suspect additional loss 
of precision in the Taylor expansion of the bivariate CPDF. 
However, the finite volume error is a double integral over all 
the cells included in the catalog and separated by more than 
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I (h M Mpc) l (h M Mpc) 

Figure 5. The cosmic error AF^/F^ as a function of scale. Each Figure 6. Same as in Fig. |], but now the long dashed, dashed- 

panel corresponds to a value of k. The dots, dashes, long dashes, long dashed and solid curve correspond respectively to the E 2 PT 

dot-long dashes correspond respectively to the SS, BeS, E 2 PT and model, the finite volume contribution, and the edge + discrcte- 

PT models. PT is shown only in its expected range of validity, ness contribution. Note the sudden cut-off at large scales for 

£ Si io, where £o is the correlation length defined by £(lo) = 1. the finite volume error, in agreement with eq. (|lH). Without the 

For k = 1, all the models give the same result. As discussed in 8(v/V)£i(2£) correction, the cut-off would not show up, but this 

the beginning of § 4, there are errorbars due to the finite number would not significantly change the total error, 
of realizations Cg = 4096, but they are so small that they do not 
show. 
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11. The contribution of close cells is small, especially when 
l/L is small. Thus E 2 PT itself appears to break down in 
the non-linear regime (SS and BeS are even less accurate), 
at least for the particular experiment we are analysing. De- 
spite that EPT itself fares quite well (Fig. ||), its simplest 
natural extension to bivariate distributions, E 2 PT, is less 
accurate, as noticed earlier by Szapudi & Szalay (1997) in 
connection with the cumulant correlators of the APM galaxy 
catalog. However, the accuracy of the calculation based on 
E 2 PT should be adequate for most practical uses, and fu- 
ture work on the representation of the bivariate distribution 
in the highly non-linear regime will result in increased pre- 
cision. 

The solid curves in Fig. |^ represent the main contribu- 
tion of the cosmic error on large scales. Here, as expected 
(SC), the cosmic error is dominated by edge effects. De- 
spite the fact that theoretical predictions were determined 
to leading order in v/V and the largest scale considered is 
£ — L/1, i.e. v/V — 1/8, the agreement between theory 
and measurement is surprisingly good. CCDFS have com- 
puted the next to leading order contribution proportional to 
the perimeter dV of the survey. With this correction, which 
increases the cosmic error especially on the largest scales, 
next to leading order theory would be inferior to the lead- 
ing order one. The reason is that the calculation of CCDFS 
assumes a perimetric curvature radius much larger than the 
cell size. This assumption, which is useful for deep galaxy 
surveys with small sky coverage, obviously fails for a com- 
pact catalogue such as this one where the cell size £ becomes 
comparable to L. 

4.3 Cosmic Error and Cosmic Bias: Variance and 
Cumulants 

So far only the full moments Fk have been examined. The 
cumulants £ and Sat, however, are the more physically mo- 
tivated quantities. But the statistics of these is complicated 
by the fact that they are ratios. For example (see Appendix 
A) 

f = F 2 /F? - 1. (19) 

As is well known in statistics (e.g., HG, SCB) (A/B) ^ 
(A) /(B). In other words, the estimator 

f = Fa/A 2 - 1 (20) 

is biased. Note that this is a general feature for any statistic 
constructed from unbiased estimators in a non-linear fash- 
ion (e.g. SCB). However, SCB showed theoretically that the 
cosmic bias defined in the Introduction, given here by 

bj = ((f) -?)/?, (21) 

is of same order of (A£/£) 2 in the regime A£/£ <C 1. Similar 
reasoning applies to the Sjv's. Thus leading order theoret- 
ical calculations neglect the bias. This can be done safely 
in the domain of validity of the perturbative approach used 
to expand a non-linear combination of biased estimators. A 
reasonable criterion proposed by SCB for this domain is that 
the cosmic bias be small compared to the relative cosmic er- 
ror which itself should be small compared to unity. For an 
arbitrary (possibly biased) statistic A this reads 

b A < AA/A < 1. (22) 



Left panels of Figure [jj are analogous to Fig. |5| and show 
the measured cosmic error as a function of scale for the bi- 
ased estimators of £, S3 and 5*4. The middle panels show the 
absolute value of the cosmic bias (open symbols) compared 
to the cosmic error (filled symbols). For additional clarity, 
the cosmic bias is plotted in linear coordinates as well in the 
right-hand panels. 

It is interesting first to compare the cosmic error for 
factorial moments and cumulants of same order. The dis- 
creteness error is negligible for the scaling regime and the 
statistics considered here. The cumulants fare better/worse 
than the factorial moments in the non- linear/weakly non- 
linear regimes, respectively. The finite volume error, dom- 
inating on small scales, is the limiting factor for factorial 
moments, while the edge effect error, dominating on large 
scales, drives the errors of the cumulants. This is in full ac- 
cord with the predictions of SCB which can be consulted for 
more details. 

The theoretical models on Fig. ^use the analytic calcu- 
lations of SCB and are computed analogously to Fig. as 
explained in § 4.1. E 2 PT only is presented in the middle and 
right-hand panels. Again, it is worth remembering that the 
leftmost points are dangerously close to the limit of possible 
contamination from artificial smoothing effects introduced 
by the force softening. 

For the variance £, the theory systematically overesti- 
mates the errors and the cosmic bias, except for the latter 
on large scales. This is not at all unexpected in light of the 
previous findings on small scales, where the three models 
SS, BeS and E 2 PT loose precision. In the weakly non-linear 
regime, £ > £0 = 7.1 h~ x Mpc, where perturbation theory 
is valid, this is somewhat disappointing. However, the dy- 
namic range is limited by the criterion (E2j) , which is hardly, 
if at all, fulfilled here. Hence the leading order perturbative 
approach is likely to be insufficient. 

For higher order statistics S3 and 5*4, the theory again 
tends to overestimate the amplitude of the measured cosmic 
bias on small scales. On large scales, where the predicted 
\bs h I presents a sudden turn-up, condition ( p2| ) breaks down, 
thus the theory is inapplicable. The measured cosmic errors, 
on the other hand, are in accord with the theory within 
the range of its validity. The agreement on small scales is 
even better for AS fe /S fc than for AF fe /F fc , k = 3,4. This, 
however, should not be over-interpreted, as it is probably 
a coincidence due to cancellation effects of the ratios S3 = 
l 3 /f andS4 = I 4 /f. 

The cosmic bias is always negative (right-hand panels 
of Fig. Q), i.e. the biased estimators tend to underestimate 
real values (SCB; HG). In this particular experiment, the 
measured cosmic bias is always dominated by the measured 
cosmic error as predicted by the perturbative approach, ex- 
cept for the largest scales. Here the cosmic bias can become 
of same order as the cosmic error. HG suggested that the 
cosmic bias should be corrected for when measuring cumu- 
lants. Whether this makes sense depends on the magnitude 
of the cosmic skewness, i.e., the skewness of the cosmic dis- 
tribution function itself. This will be discussed in more detail 
by paper II. However, it is worth noting that function T(A) 
is positively skewed and that its maximum corresponds to 
the most likely measurement. This is in general smaller than 
the average, (A). Thus, as pointed out already by SC, the 
measured value A in a finite sample is likely to underesti- 
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Figure 7. Same as Fig. g for the average correlation function (top row of panels), and the cumulants S3 (middle row of panels) and 
S4 (lower row of panels). The cosmic bias is plotted both in logarithmic coordinates (middle column of panels) and linear coordinates 
(right column of panels). The filled and the open symbols correspond to the cosmic error and the cosmic bias respectively. The theory 
breaks down on large scales for 54 shown in the bottom left-hand panel. In this regime, the leading order calculation gives negative 
(AS4/S4) 2 (see SCB). The theory result for the cosmic bias is shown for the E 2 PT model only. In the middle column panels, there are 
two long-dashed curves: each one of which should be compared with the closest symbols overall, corresponding either to the cosmic error 
(filled) or the cosmic bias (open). 



mate the real value A even if A is unbiased. If the cosmic 
skewness and/or the cosmic variance are large compared to 
the cosmic bias, it is pointless to correct for the cosmic bias. 
Either of the above is true for most surveys, including the 
upcoming wide-field surveys such as the 2dF and SDSS, thus 
bias-corrected estimators are unlikely to be useful in the fu- 
ture. 

4.4 Cosmic Error and Cosmic Bias: Void 
Probability and Scaling Function 

Upper panel of Fig. ^| shows APq /Pq as a function of scale 
compared to the prediction of CBS (long dashes), with the 
finite volume error contribution (dashes-long dashes) and 
with the edge + discreteness contribution (solid curve) . The 
agreement between theory and prediction is excellent. 

The lower panel of Figure N corresponds to the scaling 
function a. As for £ and Sn, the indicator a = — \n(Po)/N is 



biased. This bias (open symbols) is of order (Aa/a) 2 and can 
be neglected]^] The agreement between theory and measure- 
ment is less impressive than for Po, but this is mostly due 
to the difference of dynamic range covered by the error in 
upper and lower panels of Fig. |^. Moreover, the calculation 
of Aa/a by CBS is only approximate and could certainly be 
improved (see the discussion in CBS) . 

The errorbars about a are quite small: nearly an or- 
der of magnitude smaller than in Figs. |^ and (?]. According 
to equation (|IJ]), a reflects the low order statistics when 
iV c = iV| < 1 (cr 1 in Fig. |^) and the high order statis- 
tics when TVc 2> 1 (a < 1). From the point of view of the 
errors, function a is an excellent higher order indicator (as 
discussed earlier by CBS); it is better than the low order 



§ The theoretical and measured errors displayed on bottom part 
of Fig. W correspond to the biased indicator. 
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Figure 8. The cosmic error of the void probability function Pq 
(upper panel) and on the scaling function a = — ln(Po)/N (lower 
panel). The measurements (filled symbols) are compared with the 
theoretical predictions of CBS (long dashes). The finite volume er- 
ror contribution is drawn with short dash-long dash, and the edge 
+ discreteness effects contribution with solid lines. The available 
scaling range is limited by the fact that on large scales the mea- 
sured void probability is zero. For I ~ 7.8 /i" 1 Mpc (upper right 
point on upper panel), the void probability cancels from time to 
time in the subsamples £i. As a result, it is possible to compute 
the unbiased function a but the estimated cosmic error on the bi- 
ased estimator a is infinite. The open symbols in the lower panel 
correspond to the measured cosmic bias in <r. It is positive and 
much smaller than the cosmic error. It can be neglected for all 
the relevant dynamic range in the experiment considered here. 



factorial moments or cumulants, at least in the non-linear 
regime I £q. This fact alone unfortunately does not guar- 
antee the usefulness of this statistic as various models of 
large scale structure formation could be degenerate with re- 
spect to the void probability. The thorough work of Little 
& Weinberg (1994) suggests that this is indeed the case. It 
is tempting although dangerous to extrapolate the results of 
their analysis to the function a. 

4.5 Cosmic Error: Counts-in-Cells 

Upper panel of Fig. ^| shows the cosmic error in the CPDF as 
a function of N for the various scales considered in Si. The 
scale increases with the ^-coordinate of the upper right part 
of each curve. In the lower panel APjv/Pjv is represented in 




0.001 



100 1000 

N 



10* 



10° 



2 

Oh 

\ 
2 

< 



100 
10 

1 

0.1 
0.01 t 
0.001 



- 1 — I I I 1 1 ill 1 — I I 1 1 1 ill 1 — r 




mL 



0.01 0.1 



10 



100 1000 



N/N 



Figure 9. The cosmic error APjv/Pjv in the CPDF as a function 
of JV (upper panel) and as a function of N/N m ax, where iV mMC is 
the value of N for which Pn is maximum (lower panel). In the 
lower panel, only the scales large enough so that ./V max > 1 are 
displayed. 



a similar manner as a function of N/N-max where iV max is the 
value of iV for which Pjv is a maximum. [We did not display 
the (small) scales corresponding to iV m ax = or A^ max = 1]. 
In agreement with intuition, the cosmic error reaches its 
minimum in the vicinity of iV ~ iV max and becomes increas- 
ingly large in the tails. Thus the shape of the CPDF near 
its maximum has the most power to constrain in terms of 
errors. Kim & Strauss (1998) have measured the cumulants 
5*3 and Si by fitting an Edgeworth expansion convolved with 
a Poisson distribution to the measured CPDF in the 1.2 Jy 
IRAS galaxy catalog. According to their recipe, the best de- 
termined part of the CPDF near the maximum was kept for 
the fit. Their maximum likelihood approach uses a simple 
model for the cosmic error, but their method is promising. 
Its main weakness is the necessity to make a strong prior as- 
sumption for the shape of the CPDF. A natural consequence 
is that the estimated errorbars on the measured cumulants 
are considerably smaller than with the standard methods. 
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4.6 Cosmic Correlations 

So far this section has dealt only with the second moment of 
the cosmic distribution function, i.e. with the cosmic errors. 
For a full description in the Gaussian limit, however, the 
moments of the joint distribution function are needed. These 
moments form the cosmic (cross-correlation) matrix (SCB). 
It is defined as {(A— (A))(B — (B))) , where A and B axe any 
counts- in-cells related indicators, for example A = Fk (£) and 
B = Fy{l'), or A = 1(1) and B = S N {£'), etc. A detailed 
theoretical investigation can be found in SCB (for £ = £'). 
By definition, for two statistics A and B, the correlation 
coefficient — 1 < p < 1 reads as 

= (SASB) = ((A-A)(B-B)) 
P ~ AAAB ~ AAAB ' 1 ' 

The cosmic cross-correlation coefficient together with the 
errors form the full correlation matrix. The inverse of this 
is the central quantity for the joint probability distribution 
function in the Gaussian limit. As a preliminary numerical 
analysis, Figs. ^ and O present the correlation coefficients 
as functions of scale (£ — £) for factorial moments and cu- 
mulants, respectively. As in Fig. ^, the dots, dashes and long 
dashes show the theoretical predictions given by the SS, BeS 
and E 2 PT models, respectively, as computed by SCB. The 
computation of (8A8B) in eq. (^) is analogous to that of 
the cosmic error (see SCB for more details). [For A A and 
AB, and to have completely self-consistent calculations, we 
take the theoretical results as well in eq. (0)]. 

The agreement between theory and measurement is less 
convincing for the cosmic cross-correlations than for the 
cosmic error. This appearance is due partly to the linear 
coordinates of the figures which emphasize deviations, but 
nonetheless real. 

On Fig. [h^ there is a significant discord between the- 
ory and measurements for factorial moments in the middle 
top, middle bottom, and top right panels. On small scales, 
this result is quite natural: it is probably due to the inaccu- 
racy of the models SS, BeS and E 2 PT employed to describe 
the underlying bivariate distributions (§ 4.2). In the weakly 
nonlinear regime, this discrepancy is apparently puzzling, 
since the predicted cosmic error matches perfectly the mea- 
surements (Fig. ^). The disagreement increases with \k — l\, 
where k and / are the corresponding orders. On large scales, 
the cross-correlations are dominated by edge effects leading 
to the suspicion that the local Poisson approximation (SC, 

§ 4.1) is becoming increasingly inaccurate with \k — I 1-0 An- 
other although less likely possibility, is that the leading order 
approach in v/V is insufficient and higher order corrections 
are necessary to calculate cross-correlations. It would go be- 
yond the scope of this paper to analyse in detail these effects 
which are left for future research. 

For the cumulants, in addition to the above arguments, 
our perturbative approach to compute cross-correlation al- 
lows only a narrow dynamic range for analytic predictions, 
defined by criterion (f22j). In Fig. hll this condition is chosen 
for practical purpose to be 

^ This is not surprising: this approximation neglects local corre- 
lations. This is all the more inaccurate as the difference between 
the "weights" given to two overlapping cells, i.e. (N)k and (N)[ 
for factorial moments, increases. 



|&a|<AA/A<1. (24) 

This is necessary but not sufficient: the theory appears to 
disagree significantly with the measurements on large scales 
at the top left, lower left and lower middle panels of Fig. |ll| 
Despite some of the discrepancies, the general features 
of the cross-correlations are well described by the theoretical 
predictions. For instance the cross-correlation between two 
statistics Ak and Ai decreases with the difference between 
the orders \k — 1\ as predicted (SCB). In our particular exper- 
iment N is significantly correlated with £, but only weakly 
(anticorrelated) with Sk, k = 3,4. Similarly, £ and S3 are 
weakly, but S3 and S4 are strongly correlated. A detailed 
discussion on these effects can be found in SCB. 



5 SUMMARY AND DISCUSSION 

In this paper we have studied experimentally the proper- 
ties of the moments of the cosmic distribution function of 
measurements T(A), where A is an indicator of a counts-in- 
cells statistic. For a thorough examination of T(A) itself the 
reader is referred to paper II also in this volume. 

We examined the factorial moments Fk, the cumulants 
£ and Sjv's, the void probability Po, its scaling function, 
a = — ln(Po)/-Fi, an d the count-in-cells themselves Pjv- 
T(A) was measured in the largest available rCDM simula- 
tion divided into 4096 cubical subsamples. In each of these 
many subsamples, A was extracted, and its probability dis- 
tribution function T was estimated with great accuracy. The 
main results of our analysis are the following: 

(i) The measured count-in-cells in the whole simulation, 
in particular the cumulants Sn, are in excellent agreement 
with perturbation theory predictions in the weakly nonlinear 
regime. This confirms the results of numerous previous inves- 
tigations in an unprecedented dynamic range. The textbook 
quality agreement demonstrates the state of the art accu- 
racy of the simulation. Similarly, the measurements confirm 
extended perturbation theory (EPT) in the full available dy- 
namic range 0.05 ^ £ ^ 50, for Sn, N < 10. In addition one 
loop perturbation theory predictions based on the spheri- 
cal model (Fosalba & Gaztafiaga 1998) were found to be an 
excellent description of the measured Sn up to a £ 1. 

(ii) The variance of T is the square of the expected cosmic 
error, AA, in the measurement of A in a subsample, iden- 
tified with a realization of the local observed universe. The 
measurement of AA/A, for A = Po, a, Fk and Sn appears 
to be globally in good accord with the theoretical predic- 
tions of Colombi, Bouchet & Schaeffer (1995), Szapudi & 
Colombi (1996, SC) and Szapudi, Bernardeau & Colombi 
(1998a, SCB). 

In the highly non-linear regime, the theoretical predic- 
tions of SC and SCB tend to overestimate the cosmic error 
slightly, except for the ratios S3 = £ 3 /£ and Sa = £ 4 /£ . 
In the latter case, there are some cancellations and the 
agreement between theory and measurement is good even 
on small scales, but this is probably a coincidence. It ap- 
pears thus that none of the three variants of the hierarchical 
model in SC and SCB, can give an accurate enough account 
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Figure 10. The measured cosmic cross correlation coefficients of the factorial moments (symbols) are compared with the models SS 
(dots), BeS (dashes), and E 2 PT (long dashes). 
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Figure 11. Same as in Fig. |10| but for the cosmic cross correlation coefficients of the cumulants. The dynamic range for the theory is 
restrained by the condition (E4j). 



of the non-linear behavior of gravitational dynamics for the 
bivariate distribution functions. [J] 

In the weakly non-linear regime, agreement between the- 

II As discussed in § 4.1., the analysis of the cosmic error indirectly 
probes the bivariate probability distribution function Ppf M( r s£) 
of having N and M galaxies respectively in two cells of size I 
separated by distance r (see, e.g. SC). 



ory and predictions is excellent for the factorial moments, 
but less good for the cumulants, due to the limitations of 
the perturbative approach used to expand such ratios. 

Nonetheless EPT yields the most precise overall agree- 
ment with theory for our particular experiment. On small 
scales 1 h' 1 Mpc <, £ <, 4 h' 1 Mpc, EPT overestimates the 
errors perhaps by a factor of two in the worst case. 

(iii) In addition to the cosmic errors, the cosmic bias, 
6a, was studied in detail as well. An estimator is biased 
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when its ensemble average is different from the real value: 
6a = (A) /A — 1 7^ 0. This is always the case when unbiased 
estimators are combined in a non-linear fashion to form a 
new estimator (SCB, Hui & Gaztahaga 1998, HG), such as 
the cumulants. 

In agreement with SCB, the measured cosmic bias is of 
order (A.A/A) 2 and thus negligible when the cosmic error is 
small. However, as for the errors, the theory tends to over- 
estimate the bias in the non-linear regime. On large scales, 
where the cosmic bias becomes significant because of edge 
effects, the perturbative approach used by SBC to compute 
theoretical predictions is then outside of its domain of va- 
lidity. 

Note that in the regime where the cosmic bias is signif- 
icant, the cosmic error is likely to be large. For instance, 
in the particular numerical experiment used in this paper, 
the cosmic bias was always smaller than the cosmic errors 
and in most cases negligible. Moreover, in the regime where 
the bias could be significant, the cosmic distribution func- 
tion T(A) is significantly positively skewed (paper II). This 
implies that the measured A is likely to underestimate the 
true value even for an unbiased estimator. The result is an 
effective cosmic bias, at most of order AA/A. As already 
shown by SC, this effective bias can contaminate even unbi- 
ased estimators such as Fk and Pn- As a consequence, it is 
pointless correcting for the cosmic bias, in contrast with the 
proposition of HG, unless it is done in the framework of a 
maximum likelihood approach which takes into account fully 
the effects of the shape of the cosmic distribution function. 

(iv) To complete the analysis of second moments, a pre- 
liminary investigation of the cosmic correlation coefficients 
for factorial moments and cumulants was conducted. To- 
gether with the cosmic errors, these coefficients form the 
cosmic cross-correlation matrix which underlies maximum 
likelihood analysis in the Gaussian limit. 

Theoretical predictions of SBC give good qualitative ac- 
count of the measured correlation coefficients, although they 
become increasingly approximate with the difference be- 
tween the corresponding orders. This is likely to be a con- 
sequence of the local Poisson assumption (SC) employed for 
analytic predictions. 

Provided that the Gaussian limit is reached in terms of 
the error distribution, the formalism of SC and SBC allows 
for a maximum likelihood analysis of the CPDF measured in 
three-dimensional galaxy catalogs. Two preliminary investi- 
gations are currently being undertaken. Szapudi, Colombi 
& Bernardeau (1999b) reanalyse already existing joint mea- 
surements of Fi and £, and Bouchet, Colombi & Szapudi 
(1999) perform a likelihood analysis of the count-in-cells 
measured in the 1.2Jy IRAS survey (Bouchet et al. 1993). 

Paper II probes the domain of the Gaussian approxi- 
mation for the cosmic distribution function, together with 
preliminary investigations for the bivariate cosmic distribu- 
tions T(A, B). As shown there, the Gaussian limit is reached 
when the relative cosmic error is small compared to unity. 
This is expected to hold for a large dynamic range in future 
large galaxy surveys such as the 2dF and the SDSS (Colombi 
et al. 1 998). 



1999). As a result, slight modification of the formalism of SC 
and SCB is fruitful to compute theoretical cosmic errors and 
cross-correlations (Bernardeau, Colombi & Szapudi, 1999). 

Finally, it is worth to mention a few questions which 
were not addressed so far by the investigations presented in 
this paper. As light might not trace mass, the distribution 
of galaxies may be biased (not to be confused with the cos- 
mic bias), and also realistic galaxy surveys are subject to 
redshift distortion. While the above results were obtained 
for the mass, note that the theory which served as a basis of 
comparison is quite general and was formulated to describe 
phenomenologically either the mass or the galaxies. It ap- 
pears that there should be no qualitative changes introduced 
by biasing or redshift distortions (e.g., Szapudi et al., 1999e), 
thus the same theory can be used for the galaxies as for the 
mass, except perhaps with slightly different parameters, or 
underlying statistical models. In fact, two of the models (SS, 
BeS) were entirely motivated by the galaxy and not by the 
mass distribution; they are expected to be more accurate 
for realistic catalogs if used in a self-consistent fashion. The 
scaling properties underlying these models is even more ac- 
curate in redshift space, as is well known. EPT, on the other 
hand, was originally motivated by theoretical considerations 
of the mass distribution and numerical simulations (Colombi 
et al. 1997) , and therefore it is no wonder that it is the most 
successful model for the mass (but see also Scoccimarro & 
Frieman 1998). Nonetheless, even EPT was found to be a 
fairly good model for the galaxy distribution, at least in the 
EDSGC survey (Szapudi, Meiksin & Nichol 1996), a possi- 
ble indication that galaxies approximately trace mass after 
all. In addition, it is worth mentioning that biasing models 
can be nondeterministic, i.e. stochastic in nature, but this 
again does not introduce anything new qualitatively which 
could not be handled in the framework of the theory of SCB. 
Finally, the theory outlined in this paper was contrasted 
against measurements in a rCDM simulation. However, the 
analytical framework is general enough to accommodate any 
cosmological model, and there are no qualitative differences 
in this respect between different cosmologies with Gaussian 
initial conditions and hierarchical clustering. Thus repeat- 
ing the same analysis for a different CDM-like cosmogony 
would be superfluous and inconsequential. 
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APPENDIX A: 
NOTATIONS 



DEFINITIONS AND 



The count probability distribution function (CPDF) Pn, 
gives the probability of finding N objects in a cell of vol- 
ume v thrown at random in the catalog. 

Factorial moments, Fk, are defined as follows 

F k = ((N) k ) = (N(N-l) ■ ■ ■ (N-k+1)} = ^WkPN, (Al) 



where the falling factorial (N)k is defined in the first part 

of the equation. The Fk are proportional to the moments 

of the underlying density field p smoothed over the cell of 
k , 

volume v: Fk = N (p k ) (SS a; assuming the normalization 
(p) = 1), where iV is the average count in a cell: 

N=(N} = Fi. (A2) 

Counts-in-cells are related to quantities of dynamical 
interest, such as the (connected) Appoint correlation func- 
tions, £jv (e.g., Peebles 1980). The averaged Appoint corre- 
lation function over a cell is given by 



j3 

a n 



■ d rjv£(ri, . . . ,r N ). 



(A3) 



This is the connected moment of the smoothed density field, 
Cjv = (^ JV ) c (with 5 = p — 1). The connected moments or 
cumulants of a Gaussian field are identically zero for N > 3. 
In this paper, normalized cumulants are defined as 



Sn 



£ 



(A4) 



with the short-hand notation £ = £ 2 . By definition, Si = 
5*2 = 1, thus for second order £ is used. 

The quantities S3 and S4 are often called skewness and 
kurtosis in the astrophysical literature, although their def- 
inition differs slightly from the origi nal usage in statistics. 
The reason for normalization in eq. (A4) is dynamical. The 
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Sn's exhibit a weak scale dependence only due to the scale- 
free nature of gravity. In the highly nonlinear regime sta- 
ble clustering is expected to set in, (e.g., Peebles 1980) and 
in the weakly nonlinear regime perturbation theory pre- 
dicts approximate scaling depending on the initial fluctua- 
tion spectrum (e.g., Juszkiewicz, Bouchet & Colombi 1993; 
Bernardeau 1994). 

The counts-in-cells generating function, 

oo 

P(x) = x N P N , (A5) 

JV=0 

writes (White 1979; Balian & Schaeffer 1989a; SSa) 

P{x) = exp{-7V(l - x)a[N c (l - x)}}, (A6) 

where 

No = N | (A7) 

is the typical number of objects in an overdense cell (e.g., 
Balian & Schaeffer 1989a), and 

oo 
JV=1 

It is worth noticing that (White 1979; Balian & Schaeffer 
1989a; SSa) 

P{x) = P [N{l-x)], (A9) 

if the void probability is expressed in terms of average counts 
N. The measurement of Po is particularly interesting since 
it probes directly the count probability generating function: 

a(Nc) = - ln(P )/N. (A10) 

The exponential generating function for factorial mo- 
ments, 

F ^ = J2 F ^i ( A11 ) 

fe>0 

is directly related to P(x) (SSa) through 

F(x) = P(x + l). (A12) 



Combining eqs. (A6), (A8) and (A12), one can obtain a use- 
ful relation between cumulants and factorial moments (SSa) 



fc=i v ' 

The state of the art practical recipe consists of measuring 
the CDPF with high oversampling (Sect. 3), computing the 
factorial moments from eq. (Al), and fin ally c alculating the 
cumulants from the above recursion eq. ( jAlSj ) . This proce- 
dure eliminates the need for explicit discreteness correction. 

This paper has been produced using the Royal Astronomical 
Society/Blackwell Science KTjrjX style file. 
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